Introduction
Peripheral T-cell lymphomas (PTCL) encompass a heterogeneous group of uncommon diseases with an indolent or aggressive clinical course. PTCL-not otherwise specified (PTCL-NOS) is among the most frequent PTCL entities. It comprises a heterogeneous group of lymphomas that lack clear diagnostic criteria for distinct subtypes and are ultimately defined by exclusion (Alaggio et al., Leukemia 2022). A more granular stratification can be achieved by reclassifying morphologically diagnosed PTCL-NOS cases using gene expression data (Iqbal et al., Blood 2014). However, accurately diagnosing T-cell lymphomas remains a significant challenge, underscoring the need for improved diagnostic methods. Here, we developed an expression-based classifier that improves the subtyping and clinical diagnostics of PTCL.
Methods
A machine-learning (ML)-based classifier was trained using the LightGBM algorithm. The classifier was trained and validated on multiple internal and public RNA-seq and microarray datasets, including major PTCL entities. A total of 198 nodal T-follicular helper (TFH) cell lymphomas (nTFHL), 139 anaplastic large cell lymphomas (ALCL), 227 primary cutaneous T-cell lymphomas (CTCL), 140 extranodal NK/T-cell lymphomas (ENKTL) and 413 PTCL-NOS were investigated (Horwitz et al., Nat Med., 2024; Iqbal et al., Blood 2014; Iqbal et al., Blood 2010; Shin et al., Blood 2007).
To reduce the number of initial gene features, we selected differentially expressed genes for each model using the limma package from Bioconductor. To mitigate batch effects between cohorts, expression values were rank-transformed. Training included hyperparameter tuning and a search for an optimal set of gene features via the SHAP approach, with the aim of maximizing the predictive performance of the subtyping models. Finally, the optimal probability threshold for each model was determined through ROC curve analysis of cross-validation test-fold predictions during training.
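The rank transformation and ROC-based threshold selection described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: ranks are computed within each sample and scaled to (0, 1], and the threshold is chosen by maximizing Youden's J (sensitivity + specificity - 1), which is one common ROC criterion and not necessarily the one used here. The function names `rank_transform` and `youden_threshold` are hypothetical.

```python
def rank_transform(values):
    """Replace one sample's expression values with average ranks scaled to (0, 1].

    Assumption: ranking is done per sample, so only the ordering of genes
    within a sample matters, not the platform-specific scale.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def youden_threshold(labels, scores):
    """Return the score cutoff that maximizes sensitivity + specificity - 1.

    Assumption: Youden's J is the selection criterion. `labels` holds 0/1
    values and must contain at least one positive and one negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut
```

Because only the within-sample ordering is kept, two samples measured on different scales (for example, microarray intensities and RNA-seq counts) map to the same values whenever their genes are ordered identically.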
Results
The classifier consisted of four independent binary models, each predicting either one PTCL subtype or the class ‘Other’. Each model thus corresponded to one specific umbrella diagnosis: nTFHL, CTCL, ALCL, or ENKTL. PTCL-NOS samples were included in the training set as part of the ‘Other’ class to enhance the models' specificity.
Validation metrics were obtained for each model using a test dataset that excluded PTCL-NOS samples, because some of these samples could be reclassified and their true labels are unknown. The sensitivity and specificity were as follows: 0.83 and 0.99 for nTFHL, 0.96 and 0.98 for CTCL, 0.77 and 0.99 for ALCL, and 0.95 and 0.97 for ENKTL. The selected gene features also included biologically relevant genes (e.g., CXCL13 and ICOS).
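For reference, sensitivity and specificity for a single binary model follow directly from its confusion counts. The sketch below uses a hypothetical helper and toy labels, not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for 0/1 labels and predictions.

    1 marks the model's subtype and 0 marks the class 'Other'.
    Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true subtype cases detected
    specificity = tn / (tn + fp)  # fraction of 'Other' cases correctly rejected
    return sensitivity, specificity


# Toy example (illustrative values only).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = sensitivity_specificity(toy_true, toy_pred)
```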
Next, the trained models were applied to classify PTCL-NOS samples. The classifier reassigned PTCL-NOS samples to nTFHL (24.2%; 75/310), ALCL (14.2%; 44/310), CTCL (4.5%; 14/310), and ENKTL (2.9%; 9/310). A total of 163 of 310 (52.6%) PTCL-NOS samples were not assigned to any specific class and retained their original classification. Only 1.6% (5/310) of the PTCL-NOS samples had ambiguous predictions, fitting into multiple classes.
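One way to read the three outcomes above (reassigned, retained, ambiguous) is as a simple rule over the four binary models' outputs. The sketch below is an assumption about how such a rule could look, not the authors' implementation; `assign_label` and the threshold values are hypothetical. The counts are the ones reported in the text, and the final lines only recheck the percentages.

```python
SUBTYPES = ("nTFHL", "ALCL", "CTCL", "ENKTL")


def assign_label(probabilities, thresholds):
    """Combine four independent binary models into one call for a sample.

    Assumed rule: exactly one positive model gives that subtype, none
    leaves the sample as PTCL-NOS, and several mark it as ambiguous.
    """
    positive = [s for s in SUBTYPES if probabilities[s] >= thresholds[s]]
    if not positive:
        return "PTCL-NOS"
    if len(positive) == 1:
        return positive[0]
    return "ambiguous"


# Hypothetical per-model thresholds (the abstract does not report them).
example_thresholds = {s: 0.5 for s in SUBTYPES}
example_call = assign_label(
    {"nTFHL": 0.91, "ALCL": 0.08, "CTCL": 0.02, "ENKTL": 0.01},
    example_thresholds,
)

# Counts reported in the text for the 310 PTCL-NOS samples.
reported_counts = {
    "nTFHL": 75, "ALCL": 44, "CTCL": 14, "ENKTL": 9,
    "PTCL-NOS": 163, "ambiguous": 5,
}
total = sum(reported_counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in reported_counts.items()}
```

The six categories sum to 310, so every PTCL-NOS sample falls into exactly one of them.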
Reclassified PTCL-NOS cases displayed gene expression patterns similar to those of samples with the corresponding specific diagnoses. Statistical analysis revealed no significant difference in CD30 expression between ALCL, in which CD30 overexpression is a hallmark of tumor cells, and PTCL-NOS samples relabeled as ALCL. Similarly, the expression of TFH cell markers in nTFHL was not distinguishable from that in PTCL-NOS samples reclassified as nTFHL, supporting the overall robustness of the PTCL-NOS reclassification.
Conclusion
The developed expression-based ML classifier accurately predicted specific PTCL diagnoses and was further capable of reclassifying PTCL-NOS cases into specific subtypes (~50%). This new strategy provides a powerful diagnostic tool for routine clinical practice. Furthermore, the classifier includes gene features that can support targeted investigations and further improve the understanding of PTCL pathogenesis.
Sobol:BostonGene, Corp.: Current Employment. Kuznetsov:BostonGene, Corp.: Current Employment, Current equity holder in private company, Current holder of stock options in a privately-held company. Kremlev:BostonGene, Corp.: Current Employment. Inghirami:Daiichi Sankyo: Consultancy. Fowler:Roche/Genentech: Consultancy, Research Funding; TG Therapeutics: Consultancy, Research Funding; Bayer: Consultancy; Novartis: Consultancy, Research Funding; Verastem: Consultancy; Gilead: Research Funding; Abbvie: Research Funding; BeiGene: Research Funding; CelGene: Consultancy, Research Funding; BostonGene: Current Employment, Current equity holder in private company, Current holder of stock options in a privately-held company. Kotlov:BostonGene, Corp.: Current Employment, Current equity holder in private company, Current holder of stock options in a privately-held company, Patents & Royalties: BostonGene, Corp.. Nikitina:BostonGene, Corp.: Current Employment, Current equity holder in private company, Current holder of stock options in a privately-held company.